1. Introduction

Kullback-Leibler (KL) divergence (relative entropy) can be considered as a measure
of the difference/dissimilarity between sources. Estimating KL divergence from
finite realizations of a stochastic process with unknown memory is a long-standing
problem, with interesting mathematical aspects and useful applications to automatic
categorization of symbolic sequences. Namely, an empirical estimation of the divergence
can be used to classify sequences (for approaches to this problem using other methods,
in particular true metric distances, see [10], [12]; see also [1]).

In [16] Ziv and Merhav showed how to estimate the KL divergence between two
sources, using the parsing scheme of the LZ77 algorithm [15] on two finite length realizations.
They proved the consistency of the method by showing that the estimate of the
divergence for two Markovian sources converges to their relative entropy when the length
of the sequences diverges. Furthermore they proposed this estimator as a tool for a
"universal classification" of sequences.

A procedure based on the implementations of the LZ77 algorithm (gzip, winzip) is
proposed in [3]. The resulting estimate of the relative entropy is then used to construct
phylogenetic trees for languages and is proposed as a tool to solve authorship attribution
problems. Moreover, the relation between the relative entropy and the estimate given
by this procedure is analyzed in [13].

Two different algorithms are proposed and analyzed in [5], see also [6]. The first
one is based on the Burrows-Wheeler block sorting transform [4], while the other
uses the Context Tree Weighting method. The authors proved the consistency of
these approximation methods and showed that these methods outperform the others in
experiments.

In [2] it is shown how to construct an entropy estimator for stationary ergodic
stochastic sources using the non-sequential recursive pair substitution method, introduced
in [7] (see also [9] and references therein for similar approaches).

In this paper we want to discuss the use of similar techniques to construct an 
estimator of relative (and cross) entropy between a pair of stochastic sources. In 
particular we investigate how the asymptotic properties of concurrent pair substitutions 
might be used to construct an optimal (in the sense of convergence) relative entropy 
estimator. A second relevant question arises about the computational efficiency of the 
derived indicator. While here we address the first, mostly mathematical, question, we 
leave the computational and applicative aspects for forthcoming research. 

The paper is structured as follows: in section 2 we state the notations, in section
3 we describe the details of the non-sequential recursive pair substitutions (NSRPS)
method, in section 4 we prove that NSRPS preserves the cross and the relative entropy,
in section 5 we prove the main result: we can obtain an estimate of the relative entropy
by calculating the 1-block relative entropy of the sequences we obtain using the NSRPS
method.

2. Definitions and notations 

We introduce here the main definitions and notations, often following the formalism
used in [2]. Given a finite alphabet $A$, we denote with $A^* = \bigcup_{k \ge 1} A^k$ the set of finite
words. Given a word $\omega \in A^n$, we denote by $|\omega| = n$ its length and if $1 \le i \le j \le n$
and $\omega = (\omega_1, \omega_2, \dots, \omega_n)$, we use $\omega_i^j$ to indicate the subword $\omega_i^j = (\omega_i, \dots, \omega_j)$. We
use similar notations for one-sided infinite (elements of $A^{\mathbb{N}}$) or double infinite words
(elements of $A^{\mathbb{Z}}$). Often sequences will be seen as finite or infinite realizations of discrete-time
stochastic stationary, ergodic processes of a random variable $X$ with values in $A$.
The $n$-th order joint distributions $\mu_n$ identify the process and its elements follow
the consistency conditions:

$$\mu_n(\omega_1^n) = \sum_{a \in A} \mu_{n+1}(\omega_1^n a) = \sum_{a \in A} \mu_{n+1}(a\,\omega_1^n).$$

When no confusion will arise, the subscript $n$ will be omitted, and we will just use $\mu(\omega_1^n)$
to denote both the measure of the cylinder and the probability of the finite word.

Equivalently, a distribution of a process can also be defined by specifying the initial
one-character distribution $\mu_1$ and the successive conditional distributions:

$$\mu(\omega_n \,|\, \omega_1^{n-1}) = \frac{\mu_n(\omega_1^n)}{\mu_{n-1}(\omega_1^{n-1})}.$$

Given an ergodic, stationary stochastic source we define as usual:

n-block entropy

$$H_n(\mu) := -\sum_{|\omega| = n} \mu(\omega) \log \mu(\omega).$$

n-conditional entropy

$$h_n(\mu) := H_{n+1}(\mu) - H_n(\mu) = -\sum_{\omega \in A^n,\, a \in A} \mu(\omega a) \log \mu(a|\omega) =: -E_{\mu}\big(\log \mu(a|\omega_1^n)\big),$$

where $\omega a$ denotes the concatenated word $(\omega_1, \omega_2, \dots, \omega_n, a)$ and $E_{\mu}(\cdot)$ is just the
process average.

Entropy of $\mu$

$$h(\mu) := \lim_{n \to \infty} \frac{H_n(\mu)}{n} = \lim_{n \to \infty} h_n(\mu) = -E_{\mu}\big(\log \mu(a|\omega_{-\infty}^{0})\big).$$

The following properties and results are very well known [14], but at the same time
quite important for the proofs and the techniques developed here (and also in [2]):

• $h(\mu) \le \dots \le h_k(\mu) \le h_{k-1}(\mu) \le \dots \le h_1(\mu) \le H_1(\mu)$.

• A process $\mu$ is $k$-Markov if and only if $h(\mu) = h_k(\mu)$.

• Entropy Theorem: for almost all realizations of the process, we have

$$h(\mu) = \lim_{n \to \infty} -\frac{1}{n} \log \mu(x_1^n), \qquad \mu\text{-a.s.}$$
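As an illustration of the quantities just recalled, a minimal Python sketch of plug-in (empirical frequency) estimates of $H_n$ and $h_n$ from a finite realization could read as follows. The function names are ours and purely illustrative, the natural logarithm is assumed, and no bias correction for short sequences is attempted.

```python
from collections import Counter
from math import log

def block_entropy(seq, n):
    """Plug-in estimate of the n-block entropy H_n from the overlapping n-blocks of seq."""
    if n == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())

def conditional_entropy(seq, n):
    """Plug-in estimate of the n-conditional entropy h_n = H_{n+1} - H_n."""
    return block_entropy(seq, n + 1) - block_entropy(seq, n)
```

For a long realization of a $k$-Markov source one would expect `conditional_entropy(seq, n)` to stop decreasing, up to statistical fluctuations, once $n \ge k$.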

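In this paper we focus on properties involving pairs of stochastic sources on the
same alphabet with distributions $\mu$ and $\nu$, namely cross entropy and the related relative
entropy (or Kullback-Leibler divergence):

n-conditional cross entropy

$$h_n(\mu\|\nu) := -\sum_{\omega \in A^n,\, a \in A} \mu(\omega a) \log \nu(a|\omega), \qquad (2.1)$$

cross entropy

$$h(\mu\|\nu) := \lim_{n \to +\infty} h_n(\mu\|\nu), \qquad (2.2)$$

relative entropy (Kullback-Leibler divergence)

$$d(\mu\|\nu) := \lim_{n \to \infty} E_{\mu}\left(\log \frac{\mu(a|\omega_1^n)}{\nu(a|\omega_1^n)}\right) = \lim_{n \to \infty} \sum_{\omega \in A^n,\, a \in A} \mu(\omega a) \log \frac{\mu(a|\omega)}{\nu(a|\omega)}. \qquad (2.3)$$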

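Note that

$$h(\mu\|\nu) = h(\mu) + d(\mu\|\nu).$$

Moreover we stress that, if $\nu$ is $k$-Markov then, for any $\mu$,

$$h(\mu\|\nu) = h_k(\mu\|\nu). \qquad (2.4)$$

Namely $h_l(\mu\|\nu) = h_k(\mu\|\nu)$ for any $l \ge k$:

$$\begin{aligned}
h_l(\mu\|\nu) &= -\sum_{\omega \in A^{l-k},\, b \in A^k,\, a \in A} \mu(\omega b a) \log \nu(a|\omega b) \\
&= -\sum_{\omega \in A^{l-k},\, b \in A^k,\, a \in A} \mu(\omega b a) \log \nu(a|b) \\
&= -\sum_{b \in A^k,\, a \in A} \mu(b a) \log \nu(a|b) = h_k(\mu\|\nu).
\end{aligned}$$

Note that $h_1(\mu\|\nu)$ depends on $\mu$ only through its two-symbol distribution.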
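Since the 1-conditional cross entropy only involves two-symbol statistics, it can be estimated directly from pair counts. The following Python sketch is an illustration under our own assumptions (plug-in frequencies, natural logarithm, hypothetical function names): it computes an empirical version of eq. (2.1) with $n = 1$ from a realization `w` of the first source and a realization `z` of the second, and the corresponding 1-conditional relative entropy through the identity $h(\mu\|\nu) = h(\mu) + d(\mu\|\nu)$ applied at order 1.

```python
from collections import Counter
from math import log, inf

def pair_and_symbol_counts(seq):
    """Counts of consecutive pairs and of the symbols that start a pair."""
    return Counter(zip(seq, seq[1:])), Counter(seq[:-1])

def cross_entropy_1(w, z):
    """Empirical h_1(mu||nu) = -sum_{x,a} mu(xa) log nu(a|x), mu from w and nu from z.
    Returns inf when a pair seen in w never occurs in z (no domination)."""
    mu_pairs, _ = pair_and_symbol_counts(w)
    nu_pairs, nu_firsts = pair_and_symbol_counts(z)
    total = sum(mu_pairs.values())
    h = 0.0
    for (x, a), c in mu_pairs.items():
        if nu_pairs[(x, a)] == 0:
            return inf
        h -= c / total * log(nu_pairs[(x, a)] / nu_firsts[x])
    return h

def relative_entropy_1(w, z):
    """Empirical 1-conditional relative entropy: h_1(mu||nu) - h_1(mu)."""
    return cross_entropy_1(w, z) - cross_entropy_1(w, w)
```

Returning infinity when a pair of `w` is absent from `z` mirrors the domination assumption on the marginals; a practical implementation might prefer some form of smoothing instead.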

Entropy and cross entropy can be related to the asymptotic behavior of properly
defined returning times and waiting times, respectively. More precisely, given an ergodic,
stationary process $\mu$, a sample sequence $w = w_1, w_2, \dots$ and $n \ge 1$, we define the
returning time of the first $n$ characters as:

returning time $$R(w_1^n) := \min\{k \ge 1 : w_{k+1}^{k+n} = w_1^n\}. \qquad (2.5)$$

Similarly, given two realizations $w = w_1, \dots, w_n, \dots$ and $z = z_1, \dots, z_n, \dots$ of $\mu$ and $\nu$
respectively, we define the

waiting time $$W(w_1^n, z) := \min\{k \ge 1 : z_k^{k+n-1} = w_1^n\}. \qquad (2.6)$$

Obviously $W(w_1^n, w_2^{\infty}) = R(w_1^n)$, where $w_2^{\infty} = w_2, w_3, \dots$ is the shifted sequence.
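A direct, quadratic-time Python sketch of the two definitions may help fix the index conventions. It assumes the reading of (2.5) and (2.6) in which positions are counted from 1, the returning time looks for the block $w_{k+1}^{k+n}$ and the waiting time for the block $z_k^{k+n-1}$; the function names are hypothetical and `None` is returned when no match is found in the finite sample.

```python
def waiting_time(word, z):
    """W(word, z): smallest k >= 1 such that z_k ... z_{k+n-1} equals word (1-based)."""
    n = len(word)
    for k in range(1, len(z) - n + 2):
        if z[k - 1:k - 1 + n] == word:
            return k
    return None

def returning_time(w, n):
    """R(w_1^n): smallest k >= 1 such that w_{k+1} ... w_{k+n} equals w_1 ... w_n."""
    # Equivalent to the waiting time of w_1^n in the shifted sequence w_2, w_3, ...
    return waiting_time(w[:n], w[1:])
```

With these conventions the returning time is the waiting time of the prefix inside the sequence shifted by one position.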

We now have the following two important results:

Theorem 2.1 (Entropy and returning time [11]) If $\mu$ is a stationary, ergodic
process, then

$$\lim_{n \to \infty} \frac{1}{n} \log R(w_1^n) = h(\mu), \qquad \mu\text{-a.s.}$$

Theorem 2.2 (Relative entropy and waiting time [8]) If $\mu$ is stationary and
ergodic, $\nu$ is $k$-Markov and the marginals $\mu_n$ of $\mu$ are dominated by the corresponding
marginals $\nu_n$ of $\nu$, i.e. $\mu_n \ll \nu_n$, then

$$\lim_{n \to \infty} \frac{1}{n} \log W(w_1^n, z) = h(\mu) + d(\mu\|\nu) = h(\mu\|\nu), \qquad (\mu \times \nu)\text{-a.s.}$$

3. Non-sequential recursive pair substitutions

We now introduce a family of transformations on sequences and the corresponding
operators on distributions: given $a, b \in A$ (including $a = b$), $\alpha \notin A$ and $A' = A \cup \{\alpha\}$,
a pair substitution is a map $G_{ab}^{\alpha} : A^* \to A'^*$ which substitutes sequentially, from left to
right, the occurrences of $ab$ with $\alpha$. For example

$$G_{01}^{2}(0010001011100100) = 020022110200,$$

or:

$$G_{00}^{2}(0001000011) = 2012211.$$

$G = G_{ab}^{\alpha}$ is always an injective but not surjective map that can be immediately extended
also to infinite sequences $w \in A^{\mathbb{N}}$.
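The left-to-right rule matters when $a = b$, since overlapping occurrences are then resolved greedily. A minimal Python sketch of a single pair substitution, together with the expansion that inverts it on its image, could be written as follows; the function names are ours, and sequences are assumed to be strings or lists of hashable symbols.

```python
def pair_substitution(seq, a, b, alpha):
    """Scan seq from left to right and replace each occurrence of the pair (a, b) by alpha."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(alpha)
            i += 2  # both symbols of the pair are consumed
        else:
            out.append(seq[i])
            i += 1
    return out

def expand(seq, a, b, alpha):
    """Write every alpha back as the pair a, b (inverse of the substitution on its image)."""
    out = []
    for s in seq:
        out.extend([a, b] if s == alpha else [s])
    return out

print("".join(pair_substitution("0010001011100100", "0", "1", "2")))  # 020022110200
print("".join(pair_substitution("0001000011", "0", "0", "2")))        # 2012211
```

The two printed strings are the two examples given in the text; the existence of `expand` is one way to see that the map is injective.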
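The action of $G$ shortens the original sequence: we denote by $Z$ the inverse of the
contraction rate:

$$\frac{1}{Z_{ab}^{\alpha}(\omega_1^n)} := \frac{|G_{ab}^{\alpha}(\omega_1^n)|}{|\omega_1^n|} = 1 - \frac{\#\{ab \subseteq \omega_1^n\}}{n},$$

where $\#\{ab \subseteq \omega_1^n\}$ is the number of pairs replaced in $\omega_1^n$.

For $\mu$-typical sequences we can pass to the limit and define:

$$\frac{1}{Z_{ab}^{\alpha}} := \lim_{n \to \infty} \frac{|G(\omega_1^n)|}{|\omega_1^n|} = \begin{cases} 1 - \mu(ab) & \text{if } a \ne b, \\ 1 - \mu(aa) + \mu(aaa) - \mu(aaaa) + \cdots & \text{if } a = b. \end{cases}$$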

An important remark is that if we start from a source where admissible words are
described by constraints on consecutive symbols, this property will remain true even
after an arbitrary pair substitution. In other words (see Theorem 2.1 in [2]): a pair
substitution maps pair constraints into pair constraints.

A pair substitution $G_{ab}^{\alpha}$ naturally induces a map on the set of ergodic stationary
measures on $A^{\mathbb{Z}}$ by mapping typical sequences w.r.t. the original measure $\mu$ into typical
sequences w.r.t. the transformed measure $\mathcal{G}\mu$: given $z \in A'^*$ then (Theorem 2.2 in [2])

$$\mathcal{G}\mu(z) := \lim_{n \to \infty} \frac{\#\{z \subseteq G(\omega_1^n)\}}{|G(\omega_1^n)|}$$

exists and is constant $\mu$ almost everywhere in $\omega \in A^{\mathbb{Z}}$; moreover $\{\mathcal{G}\mu(z)\}_{z \in A'^*}$ are the
marginals of an ergodic measure on $A'^{\mathbb{Z}}$.

Again in [2], the following results are proved showing how entropies transform
under the action of $\mathcal{G} = \mathcal{G}_{ab}^{\alpha}$, with expanding factor $Z = Z_{ab}^{\alpha}$:

Invariance of entropy

$$h(\mathcal{G}\mu) = Z\, h(\mu).$$

Decreasing of the 1-conditional entropy

$$h_1(\mathcal{G}\mu) \le Z\, h_1(\mu).$$

Moreover, $\mathcal{G}$ maps 1-Markov measures into 1-Markov measures. In fact, if $\mu$ is 1-Markov:

$$h(\mathcal{G}\mu) \le h_1(\mathcal{G}\mu) \le Z\, h_1(\mu) = Z\, h(\mu) = h(\mathcal{G}\mu).$$

Decreasing of the k-conditional entropy

$$h_k(\mathcal{G}\mu) \le Z\, h_k(\mu).$$

Moreover $\mathcal{G}$ maps $k$-Markov measures into $k$-Markov measures.

While later on we will give another proof of the first fact, we remark that this
property, together with the decrease of the 1-conditional entropy, reflects, roughly
speaking, the fact that the amount of information of $G(\omega)$, which is equal to that
of $\omega$, is more concentrated on the pairs of consecutive symbols.

As we are interested in sequences of recursive pair substitutions, we assume to start
with an initial alphabet $A$ and define an increasing alphabet sequence $A_1, A_2, \dots, A_N, \dots$.
Given $A_{N-1}$ and chosen $a_N, b_N \in A_{N-1}$ (not necessarily different):

• we indicate with $\alpha_N \notin A_{N-1}$ a new symbol and define the new alphabet as
$A_N = A_{N-1} \cup \{\alpha_N\}$;

• we denote with $G_N$ the substitution map $G_N = G_{a_N b_N}^{\alpha_N} : A_{N-1}^* \to A_N^*$ which
substitutes with $\alpha_N$ the occurrences of the pair $a_N b_N$ in the strings on the alphabet
$A_{N-1}$;

• we denote with $\mathcal{G}_N$ the corresponding map from the measures on $A_{N-1}^{\mathbb{Z}}$ to the
measures on $A_N^{\mathbb{Z}}$;

• we define by $Z_N$ the corresponding normalization factor $Z_N = Z_{a_N b_N}^{\alpha_N}$.

We use the over-line to denote iterated quantities:

$$\bar{G}_N := G_N \circ G_{N-1} \circ \cdots \circ G_1, \qquad \bar{\mathcal{G}}_N := \mathcal{G}_N \circ \mathcal{G}_{N-1} \circ \cdots \circ \mathcal{G}_1$$

and also

$$\bar{Z}_N := Z_N Z_{N-1} \cdots Z_1.$$

The asymptotic properties of $\bar{Z}_N$ clearly depend on the pairs chosen in the
substitutions. In particular, if at any step $N$ the chosen pair $a_N b_N$ is the pair of maximum
frequency of $A_{N-1}$ then (Theorem 4.1 in [2]):

$$\lim_{N \to \infty} \bar{Z}_N = +\infty.$$
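On a finite realization the iterated procedure with the most-frequent-pair strategy can be sketched as below. This is only an illustration under our own assumptions: symbols are non-negative integers, new symbols are obtained by incrementing the largest one, pair frequencies are counted on overlapping positions, and the finite-sample value of $\bar{Z}_N$ is taken as the ratio between the initial and the final length (the product of the single-step ratios telescopes). All names are hypothetical.

```python
from collections import Counter

def substitute(seq, a, b, alpha):
    """Left-to-right replacement of the pair (a, b) by alpha."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(alpha)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def nsrps(seq, steps):
    """Apply up to `steps` most-frequent-pair substitutions.
    Returns the final sequence, the list of rules (a, b, alpha) and the empirical Z_bar."""
    seq = list(seq)
    n0 = len(seq)
    rules = []
    alpha = max(seq) + 1  # assumes integer symbols
    for _ in range(steps):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break  # fewer than two symbols left
        (a, b), _count = pairs.most_common(1)[0]
        seq = substitute(seq, a, b, alpha)
        rules.append((a, b, alpha))
        alpha += 1
    return seq, rules, n0 / len(seq)
```

For instance, on the periodic sequence `[0, 1] * 8` two steps first replace `01` and then the pair of new symbols, so the length shrinks from 16 to 4 and the empirical $\bar{Z}_2$ equals 4.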

Regarding the asymptotic properties of the entropy we have the following theorem
that rigorously shows that $\mu_N := \bar{\mathcal{G}}_N \mu$ becomes asymptotically 1-Markov:

Theorem 3.1 (Entropy via NSRPS [2]) If

$$\lim_{N \to \infty} \bar{Z}_N = +\infty$$

then

$$h(\mu) = \lim_{N \to \infty} \frac{1}{\bar{Z}_N} h_1(\mu_N).$$

The main result of this paper is the generalization of this theorem to the cross and
relative entropy.

Before entering into the details of our construction let us sketch here the main steps.
In particular let us consider the cross entropy (the same argument will apply to the
relative entropy) of the measure $\mu$ with respect to the measure $\nu$, i.e. $h(\mu\|\nu)$.
In what follows $\bar{Z}_N^{\mu}$ and $\bar{Z}_N^{\nu}$ denote the expanding factors of the same sequence of
substitutions computed with respect to $\mu$ and $\nu$ respectively.

As we will show, but for the normalization factor $\bar{Z}_N^{\mu}$, this is equal to the cross
entropy of the measure $\bar{\mathcal{G}}_N \mu$ w.r.t. the measure $\bar{\mathcal{G}}_N \nu$:

$$h(\mu\|\nu) = \frac{h(\bar{\mathcal{G}}_N \mu \,\|\, \bar{\mathcal{G}}_N \nu)}{\bar{Z}_N^{\mu}}.$$

Moreover, as we have seen above, if we choose the substitution in a suitable way (for
instance if at any step we substitute the pair with maximum frequency with respect to $\nu$)
then $\bar{Z}_N^{\nu} \to \infty$ and the measure $\bar{\mathcal{G}}_N \nu$ becomes asymptotically 1-Markov as $N \to \infty$.
Interestingly, we do not know if $\bar{Z}_N^{\mu}$ also diverges (we will discuss this point in the
sequel).

Nevertheless, noticing that the cross entropy of a generic ergodic source w.r.t. a 1-Markov
source is equal to the 1-Markov cross entropy between the two sources, it is
reasonable to expect that the cross entropy $h(\mu\|\nu)$ can be obtained as the following
limit:

$$h(\mu\|\nu) = \lim_{N \to \infty} \frac{h_1(\bar{\mathcal{G}}_N \mu \,\|\, \bar{\mathcal{G}}_N \nu)}{\bar{Z}_N^{\mu}}.$$

This is exactly what we will prove in the next two sections.
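The limit just sketched suggests a simple finite-sample procedure: apply the same substitutions to a realization of each source, estimate the 1-conditional cross entropy of the transformed sequences, and divide by the contraction of the first sequence. The Python sketch below is only an illustration of this idea, not an estimator with proven finite-sample guarantees. It assumes integer symbols, plug-in pair frequencies, natural logarithms, and that the pair to substitute is the most frequent one in the realization `z` of the second source; all function names are hypothetical, and none of the correction terms relevant for large numbers of substitutions are included.

```python
from collections import Counter
from math import log, inf

def substitute(seq, a, b, alpha):
    """Left-to-right replacement of the pair (a, b) by alpha."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(alpha)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def cross_entropy_1(w, z):
    """Empirical h_1(mu||nu): pair frequencies from w, conditional frequencies from z."""
    mu_pairs = Counter(zip(w, w[1:]))
    nu_pairs = Counter(zip(z, z[1:]))
    nu_firsts = Counter(z[:-1])
    total = sum(mu_pairs.values())
    h = 0.0
    for (x, a), c in mu_pairs.items():
        if nu_pairs[(x, a)] == 0:
            return inf  # a pair of w never seen in z
        h -= c / total * log(nu_pairs[(x, a)] / nu_firsts[x])
    return h

def nsrps_cross_entropy(w, z, steps):
    """Sketch of h_1(G_N w || G_N z) divided by the empirical expanding factor of w."""
    w, z = list(w), list(z)
    n0 = len(w)
    alpha = max(max(w), max(z)) + 1  # assumes integer symbols
    for _ in range(steps):
        pairs = Counter(zip(z, z[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # most frequent pair of z
        w = substitute(w, a, b, alpha)
        z = substitute(z, a, b, alpha)
        alpha += 1
    z_mu = n0 / len(w)  # empirical expanding factor for the first sequence
    return cross_entropy_1(w, z) / z_mu
```

With `steps = 0` the function reduces to the plain two-symbol estimate; for memoryless binary sources a few substitutions should leave the value close to $-\sum_a \mu(a) \log \nu(a)$, up to statistical fluctuations.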



4. Scaling of (relative) entropy via waiting times 

We first show how the relative entropy between two stochastic processes $\mu$ and $\nu$ scales
after acting with the same pair substitution on both sources to obtain $\mathcal{G}\mu$ and $\mathcal{G}\nu$. More
precisely we make use of Theorem 2.2 and have the following:

Theorem 4.1 (Invariance of relative entropy for pair substitution) If $\mu$ is ergodic,
$\nu$ is a Markov chain and $\mu_n \ll \nu_n$, then if $G$ is a pair substitution

$$d(\mathcal{G}\mu\|\mathcal{G}\nu) = Z^{\mu}\, d(\mu\|\nu).$$

Proof. To fix the notations, let us denote by $w$ and $z$ the infinite realizations of
the processes of measure $\mu$ and $\nu$ respectively, and by $w_1^n$ and $z_1^n$ the corresponding finite
substrings. Let us denote by $a, b \in A$ the characters involved in the pair substitution
$G = G_{ab}^{\alpha}$. Moreover let us denote the waiting time with the shorter notation:

$$t_n := W(w_1^n, z).$$

We now explore how the waiting time rescales with respect to the transformation $G$: we
consider the first time we see the sequence $G(w_1^n)$ inside the sequence $G(z)$. To start
with, we assume that $w_1 \ne b$, as we can always consider Th. 2.2 for realizations with a
fixed prefix of positive probability. Moreover we choose a subsequence $\{n_i\}$ such that
$n_i$ is the smallest $n > n_{i-1}$ such that $w_n \ne a$. Of course $n_i \to \infty$ as $i \to \infty$. In this
case, it is easy to observe that

$$W(G(w_1^{n_i}), G(z)) = |G(z_1^{t_{n_i}-1})| + 1.$$

Then, using Theorem 2.2,

$$\begin{aligned}
h(\mathcal{G}\mu\|\mathcal{G}\nu) &= \lim_{i \to \infty} \frac{1}{|G(w_1^{n_i})|} \log\left[W(G(w_1^{n_i}), G(z))\right] \\
&= \lim_{i \to \infty} \frac{n_i}{|G(w_1^{n_i})|} \left[\frac{1}{n_i} \log t_{n_i} + \frac{1}{n_i} \log \frac{|G(z_1^{t_{n_i}-1})| + 1}{t_{n_i}}\right] \\
&= Z^{\mu}\, h(\mu\|\nu), \qquad (4.7)
\end{aligned}$$

where in the last step we used the fact that $t_{n_i} \to \infty$ as $i \to \infty$, the definition of $Z^{\nu}$
(the ratio inside the last logarithm converges to $1/Z^{\nu}$, so the second term vanishes), the
definition of $Z^{\mu}$ and Theorem 2.2 for $\mu$ and $\nu$. Note that for $\mu = \nu$, equation (4.7) reproduces
the content of Theorem 3.1 of [2]:

$$h(\mathcal{G}\mu) = Z^{\mu}\, h(\mu),$$

which, together with $h(\mu\|\nu) = h(\mu) + d(\mu\|\nu)$, implies

$$d(\mathcal{G}\mu\|\mathcal{G}\nu) = Z^{\mu}\, d(\mu\|\nu).$$

Note that the limit in Th. 2.2 is almost surely unique and then the initial restrictive
assumption $w_1 \ne b$ and the use of the subsequence $n_i$ have no consequences on the
thesis; this concludes the proof. $\square$
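The rescaling of the waiting time used in the proof can be checked numerically on finite samples. The sketch below assumes the 1-based convention in which $W(w_1^n, z)$ is the starting position of the first occurrence of $w_1^n$ in $z$, so that the identity reads $W(G(w_1^n), G(z)) = |G(z_1^{t-1})| + 1$ with $t = W(w_1^n, z)$, for words with $w_1 \ne b$ and $w_n \ne a$. The function names are hypothetical.

```python
import random

def substitute(seq, a, b, alpha):
    """Left-to-right replacement of the pair (a, b) by alpha."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(alpha)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def waiting_time(word, z):
    """Smallest k >= 1 such that z_k ... z_{k+n-1} equals word (1-based), or None."""
    n = len(word)
    for k in range(1, len(z) - n + 2):
        if z[k - 1:k - 1 + n] == word:
            return k
    return None

def both_sides(w, z, a, b, alpha):
    """Return (W(G(w), G(z)), |G(z_1^{t-1})| + 1) with t = W(w, z), or None if w is not in z."""
    t = waiting_time(w, z)
    if t is None:
        return None
    lhs = waiting_time(substitute(w, a, b, alpha), substitute(z, a, b, alpha))
    rhs = len(substitute(z[:t - 1], a, b, alpha)) + 1
    return lhs, rhs

if __name__ == "__main__":
    rng = random.Random(1)
    z = [rng.randint(0, 1) for _ in range(2000)]
    w = [0, 0, 1, 0, 1, 1]  # starts with a symbol != b = 1 and ends with a symbol != a = 0
    print(both_sides(w, z, 0, 1, 2))
```

The two boundary conditions guarantee that the occurrence of the word in $z$ is not cut by a substituted pair, which is why the two returned numbers coincide whenever the word is found.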

Before discussing the convergence of relative entropy under successive substitutions
we go through a simple explicit example of Theorem 4.1, in order to show the
difficulties we deal with when we try to use the explicit expressions of the transformed
measures we find in [2].

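Example. We treat here the simplest case: $\mu$ and $\nu$ are Bernoulli binary processes
with parameters $\mu(0), \mu(1)$ and $\nu(0), \nu(1)$ respectively. We consider the substitution
$G = G_{01}^{2}$ given by $01 \to 2$. It is long but easy to verify that $\mathcal{G}\mu$ is a stationary, ergodic,
1-Markov process with equilibrium state

$$\mathcal{G}\mu(0) = Z \mu(00), \quad \mathcal{G}\mu(1) = Z(\mu(1) - \mu(01)), \quad \mathcal{G}\mu(2) = Z \mu(01),$$

where $Z = Z^{\mu}(01) = (1 - \mu(01))^{-1}$.

For example, given a $\mathcal{G}\mu$-generic sequence $y_1, \dots, y_m$, corresponding to a $\mu$-generic
sequence $x_1, \dots, x_n$ ($y = Gx$):

$$\begin{aligned}
\mathcal{G}\mu(0) &= \lim_{m \to \infty} \frac{1}{m} \#\{0 \in y_1^m\} = \lim_{n \to \infty} \frac{n}{m} \cdot \frac{\#\{0 \in x_1^n\} - \#\{01 \in x_1^n\}}{n} \\
&= (\mu(0) - \mu(01)) \cdot \lim_{n \to \infty} \frac{n}{n - \#\{01 \in x_1^n\}} = Z(\mu(0) - \mu(01)) = Z \mu(00).
\end{aligned}$$

Clearly:

$$\mathcal{G}\mu(0) + \mathcal{G}\mu(1) + \mathcal{G}\mu(2) = 1.$$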
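Using the same argument as before, it is now possible to write down the probability
distribution of pairs of characters for $\mathcal{G}\mu$. Again the following holds for a generic process:

$$\begin{array}{lll}
\frac{\mathcal{G}\mu(00)}{Z} = \mu(00) - \mu(001) = \mu(000) & \frac{\mathcal{G}\mu(01)}{Z} = 0 & \frac{\mathcal{G}\mu(02)}{Z} = \mu(001) \\[4pt]
\frac{\mathcal{G}\mu(10)}{Z} = \mu(10) - \mu(010) - \mu(101) + \mu(0101) & \frac{\mathcal{G}\mu(11)}{Z} = \mu(11) - \mu(011) & \frac{\mathcal{G}\mu(12)}{Z} = \mu(101) - \mu(0101) \\[4pt]
\frac{\mathcal{G}\mu(20)}{Z} = \mu(010) - \mu(0101) & \frac{\mathcal{G}\mu(21)}{Z} = \mu(011) & \frac{\mathcal{G}\mu(22)}{Z} = \mu(0101)
\end{array}$$

It is easy to see that $\sum_{x,y = 0,1,2} \mathcal{G}\mu(xy) = 1$. Now we can write the transition matrix
$P$ for the process $\mathcal{G}\mu$ as $P(y|x) = \mathcal{G}\mu(xy)/\mathcal{G}\mu(x)$:

$$P = \begin{pmatrix} P(0|0) & P(1|0) & P(2|0) \\ P(0|1) & P(1|1) & P(2|1) \\ P(0|2) & P(1|2) & P(2|2) \end{pmatrix} = \begin{pmatrix} \mu(0) & 0 & \mu(1) \\ \mu(00) & \mu(1) & \mu(01) \\ \mu(00) & \mu(1) & \mu(01) \end{pmatrix}.$$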



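We now denote with $Q$ the transition matrix for $\mathcal{G}\nu$. For the two 1-Markov
processes, we have

$$d(\mathcal{G}\mu\|\mathcal{G}\nu) = \sum_{x = 0,1,2} \mathcal{G}\mu(x) \sum_{y = 0,1,2} P(y|x) \log \frac{P(y|x)}{Q(y|x)}.$$

Via straightforward calculations, using the product structure of the measure $\mu$:

$$\begin{aligned}
d(\mathcal{G}\mu\|\mathcal{G}\nu) &= Z\mu(00) \left[\mu(0) \log \frac{\mu(0)}{\nu(0)} + \mu(1) \log \frac{\mu(1)}{\nu(1)}\right] \\
&\quad + Z\mu(11) \left[\mu(00) \log \frac{\mu(00)}{\nu(00)} + \mu(1) \log \frac{\mu(1)}{\nu(1)} + \mu(01) \log \frac{\mu(01)}{\nu(01)}\right] \\
&\quad + Z\mu(01) \left[\mu(00) \log \frac{\mu(00)}{\nu(00)} + \mu(1) \log \frac{\mu(1)}{\nu(1)} + \mu(01) \log \frac{\mu(01)}{\nu(01)}\right] \\
&= Z\mu(00)\, d(\mu\|\nu) + Z\mu(1) \left[\mu(00) \log \frac{\mu(00)}{\nu(00)} + \mu(1) \log \frac{\mu(1)}{\nu(1)} + \mu(01) \log \frac{\mu(01)}{\nu(01)}\right] \\
&= Z\mu(00)\, d(\mu\|\nu) + Z\mu(1) \left[\mu(0)\, d(\mu\|\nu) + d(\mu\|\nu)\right] \\
&= Z\, d(\mu\|\nu) \left(\mu(00) + \mu(10) + \mu(1)\right) \\
&= Z\, d(\mu\|\nu).
\end{aligned}$$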
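The computation of the example can also be verified numerically. The following Python sketch builds the equilibrium state and the transition matrix written above for a Bernoulli source under the substitution $01 \to 2$, evaluates the relative entropy between the two resulting 1-Markov chains, and compares it with $Z\, d(\mu\|\nu)$. Function names and the chosen parameters are ours; natural logarithms are used.

```python
from math import log, isclose

def bernoulli_kl(p0, q0):
    """d(mu||nu) for Bernoulli sources with mu(0) = p0 and nu(0) = q0."""
    return p0 * log(p0 / q0) + (1 - p0) * log((1 - p0) / (1 - q0))

def substituted_chain(p0):
    """Expanding factor Z, equilibrium state and transition matrix of the image of a
    Bernoulli source with mu(0) = p0 under the substitution 01 -> 2 (symbols 0, 1, 2)."""
    p1 = 1 - p0
    Z = 1 / (1 - p0 * p1)
    pi = [Z * p0 * p0, Z * p1 * p1, Z * p0 * p1]
    P = [[p0, 0.0, p1],
         [p0 * p0, p1, p0 * p1],
         [p0 * p0, p1, p0 * p1]]
    return Z, pi, P

def markov_kl(pi, P, Q):
    """Relative entropy rate between two 1-Markov chains: sum_x pi(x) sum_y P log(P/Q)."""
    total = 0.0
    for x in range(len(pi)):
        for y in range(len(pi)):
            if P[x][y] > 0:
                total += pi[x] * P[x][y] * log(P[x][y] / Q[x][y])
    return total

Z, pi, P = substituted_chain(0.3)
_, _, Q = substituted_chain(0.6)
print(isclose(markov_kl(pi, P, Q), Z * bernoulli_kl(0.3, 0.6)))  # True
```

The zero entry $P(1|0)$ is matched by a zero in $Q$, so it is simply skipped in the sum.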



5. The convergence 

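We now prove that the 1-Markov cross entropy between $\mu_N := \bar{\mathcal{G}}_N \mu$ and $\nu_N := \bar{\mathcal{G}}_N \nu$,
renormalized by $\bar{Z}_N^{\mu}$, converges to the cross entropy between $\mu$ and $\nu$ as the number $N$
of pair substitutions goes to $\infty$.

More precisely:

Theorem 5.1 (KL divergence via NSRPS) If $\bar{Z}_N^{\nu} \to +\infty$ as $N \to +\infty$,

$$h(\mu\|\nu) = \lim_{N \to \infty} \frac{h_1(\bar{\mathcal{G}}_N \mu \,\|\, \bar{\mathcal{G}}_N \nu)}{\bar{Z}_N^{\mu}}.$$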

Proof. Let us define, as in [2], the following operators on the ergodic measures: $\mathcal{P}$ is
the projection operator that maps a measure to its 1-Markov approximation, whereas
$\bar{\mathcal{P}}_N$ is the operator such that for any arbitrary $\nu$

$$\bar{\mathcal{G}}_N \bar{\mathcal{P}}_N \nu = \mathcal{P} \bar{\mathcal{G}}_N \nu.$$

We notice (see [2] for the details) that the normalization constant for $\bar{\mathcal{P}}_N \nu$ is the same
as that for $\nu$.
The measure $\bar{\mathcal{P}}_N \nu$ is not 1-Markov, but we know that it becomes 1-Markov after $N$
steps of substitutions, in fact it becomes $\mathcal{P} \bar{\mathcal{G}}_N \nu$. Moreover, as discussed in [2], it is an
approximation of $\nu$ if $\bar{Z}_N^{\nu}$ diverges: for any $\omega$ of length $k$,

$$|\bar{\mathcal{P}}_N \nu(\omega) - \nu(\omega)| \le \frac{2k}{\bar{Z}_N^{\nu}}. \qquad (5.8)$$

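Now it is easy to establish the following chain of equalities:

$$h(\mu\|\bar{\mathcal{P}}_N \nu) = \frac{h(\bar{\mathcal{G}}_N \mu \,\|\, \bar{\mathcal{G}}_N \bar{\mathcal{P}}_N \nu)}{\bar{Z}_N^{\mu}} = \frac{h(\bar{\mathcal{G}}_N \mu \,\|\, \mathcal{P} \bar{\mathcal{G}}_N \nu)}{\bar{Z}_N^{\mu}} = \frac{h_1(\bar{\mathcal{G}}_N \mu \,\|\, \bar{\mathcal{G}}_N \nu)}{\bar{Z}_N^{\mu}},$$

where we have used the conservation of the cross entropy $h$ and the fact that $h(\pi\|\xi) = h_1(\pi\|\xi)$
if $\xi$ is 1-Markov, as shown in eq. (2.4); the last equality also uses that a measure and its
1-Markov approximation have the same one-step conditional probabilities. To conclude the proof
we have to show that

$$\lim_{N \to \infty} h(\mu\|\bar{\mathcal{P}}_N \nu) = h(\mu\|\nu).$$

This is an easy consequence of eq. (5.8), the definition (2.1) and eq. (2.4). $\square$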



6. Conclusions and remarks 

It is important to remark that we are not assuming the divergence of $\bar{Z}_N^{\mu}$ too, as it is
not necessary for the convergence to the (rescaled) two-character relative entropy.

Nevertheless, it would be interesting to understand both the topological and
statistical constraints that prevent or permit the divergence of the expanding factor
$\bar{Z}_N^{\mu}$. Experimentally, it seems that if we start with two measures with finite relative
entropy (i.e. with absolutely continuous marginals), then if we choose the standard
strategy (most frequent pair substitution) for the sequence of pair substitutions that
yields the divergence of $\bar{Z}_N^{\nu}$, we also simultaneously obtain the divergence of $\bar{Z}_N^{\mu}$ (see for
instance fig. 1).

On the other hand, it seems possible to consider particular sources and particular
strategies of pair substitutions with diverging $\bar{Z}_N^{\nu}$ that prevent the divergence of $\bar{Z}_N^{\mu}$. At
this moment we do not have conclusive rigorous mathematical results on this subject.



Finally, let us note that Th. 5.1 does not directly give an algorithm to estimate
the relative entropy: in any implementation we would have to specify the "optimal"
number of pair substitutions, with respect to the length of the initial sequences and also
with respect to the size of the initial alphabet. Namely, in the estimate we have to
take into account at least two correction terms, which diverge with $N$: the entropy cost
of writing the substitutions and the entropy cost of writing the frequencies of the pairs
of characters in the alphabet we obtain after the substitutions (or equivalent quantities
if we use, for instance, arithmetic codings modeling the two character frequencies).

For what concerns possible implementations of the method it is important to notice
that the NSRPS procedure can be implemented in linear time [9]. Therefore it seems
plausible that reasonably fast algorithms to compute relative entropy via NSRPS
can be designed. Anyway, preliminary numerical experiments show that for sources of
finite memory this method seems to have the same limitations as those based on parsing
procedures, with respect to the methods based on the analysis of context introduced in
[5].

Figure 1. $\bar{Z}_N^{\mu}$ and $\bar{Z}_N^{\nu}$ for realizations of 10^ characters of two Markov processes on
{0, 1} with memory of length 5. The pairs are chosen with the standard strategy
(most frequent pair substitution) for the sequences of starting measure $\nu$.

In fig. 2 we show the convergence of the estimates of the entropies of the two
sources and of the cross entropy, given by Th. 5.1, for two Markov processes of memory 5.
In this case, the number of substitutions $N = 20$ is small with respect to the length of
the sequences 10®, so the correction terms are negligible.

Let us finally note that the cross entropy estimate might show large variations for
particular values of $N$. This could be explained by the fact that for these values of
$N$ pairs with particular relevance for one source with respect to the other have been
substituted. This example suggests that the NSRPS method for the estimation of the
cross entropy should be useful in sequence analysis, for example in order to detect
strings with a peculiar statistical role.



References 

[1] C. Basile, D. Benedetto, E. Caglioti, M. Degli Esposti: An example of mathematical authorship
attribution. J. Math. Phys. 49, 125211 (2008), doi:10.1063/1.2996507.
[2] D. Benedetto, E. Caglioti, D. Gabrielli: Non-sequential recursive pair substitution: some rigorous
results. J. Stat. Mech. Theory Exp., ISSN 1742-5468 (online), 09, pp. 1-21 (2006),
doi:10.1088/1742-5468/2006/09/P09011.
[3] D. Benedetto, E. Caglioti, V. Loreto: Language Trees and Zipping. Physical Review Letters 88, 4
(2002).
[4] M. Burrows, D. J. Wheeler: A block sorting lossless data compression algorithm. Tech. Rep.
124, Digital Equipment Corporation, Palo Alto, Calif. (1994).



Relative entropy via non-sequential recursive pair substitution

Figure 2. ... obtained after N pair substitutions. The dashed lines are the corresponding analytic
values.



[5] H. Cai, S. R. Kulkarni, S. Verdu: Universal Divergence Estimation for Finite-Alphabet Sources.
IEEE Trans. Inf. Theory 52, no. 8 (2006).
[6] H. Cai, S. Kulkarni, S. Verdu: Universal Estimation of Entropy via Block Sorting. IEEE Trans.
Inf. Theory 50, no. 7 (2004).
[7] P. Grassberger: Data compression and entropy estimates by non-sequential recursive pair
substitution. arXiv:physics/0207023.
[8] I. Kontoyiannis: Asymptotic recurrence and waiting times for stationary processes. Journal of
Theoretical Probability 11 (1998), pp. 795-811.
[9] N. Jesper Larsson, A. Moffat: Off-Line Dictionary Based Compression. IEEE Transactions on
Information Theory 88, no. 11, pp. 1722-1732 (2000).
[10] M. Li, X. Chen, X. Li, B. Ma, P. Vitanyi: The similarity metric. IEEE Trans. Inf. Theory 50, no.
12, pp. 3250-3264 (2004).
[11] D.S. Ornstein, B. Weiss: Entropy and data compression schemes. IEEE Transactions on
Information Theory 39, 1 (1993), pp. 78-83.
[12] H.H. Otu, K. Sayood: A new sequence distance measure for phylogenetic tree construction.
Bioinformatics 19, 16 (2003).
[13] A. Puglisi, D. Benedetto, E. Caglioti, V. Loreto, A. Vulpiani: Data Compression and Learning in
Time Sequences Analysis. Physica D 180, pp. 92-107 (2003).
[14] P.C. Shields: The Ergodic Theory of Discrete Sample Paths. American Mathematical Society
(1996).
[15] J. Ziv, A. Lempel: A Universal Algorithm for Sequential Data Compression. IEEE Transactions
on Information Theory 23, 3 (1977), pp. 337-343.
[16] J. Ziv, N. Merhav: A measure of relative entropy between individual sequences with application to
universal classification. IEEE Transactions on Information Theory 39, 4 (1993), pp. 1270-1279.



